Running Individual Documentation Sections

This notebook guides you through using the run_documentation_tests() function in the ValidMind Developer Framework for targeted testing. The function lets you run tests on individual sections, or specific groups of sections, in your model documentation.

As a model developer, running individual documentation sections is useful in various development scenarios. For instance, when updates are made to a model, often only certain parts of the documentation require revision. The run_documentation_tests() function allows you to directly test only these affected sections, thus saving you time and ensuring that the documentation accurately reflects the latest changes.

This guide includes the code required to:

  • Load the demo dataset
  • Preprocess the raw dataset
  • Train a model for testing
  • Initialize ValidMind objects
  • Run the data preparation documentation section
  • Run the model development documentation section
  • Run multiple documentation sections

Before you begin

New to ValidMind?

For access to all features available in this notebook, create a free ValidMind account.

Signing up is FREE — Sign up now

If you encounter errors due to missing modules in your Python environment, install the modules with pip install, and then re-run the notebook. For more help, refer to Installing Python Modules.

Install the client library

%pip install -q validmind
Note: you may need to restart the kernel to use updated packages.

Initialize the client library

ValidMind generates a unique code snippet for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

  1. In a browser, log into the Platform UI.

  2. In the left sidebar, navigate to Model Inventory and click + Register new model.

  3. Enter the model details, making sure to select Binary classification as the template and Marketing/Sales - Attrition/Churn Management as the use case, and click Continue. (Need more help?)

  4. Go to Getting Started and click Copy snippet to clipboard.

Next, replace this placeholder with your own code snippet:

# Replace with your code snippet

import validmind as vm

vm.init(
    api_host="https://api.prod.validmind.ai/api/v1/tracking",
    api_key="...",
    api_secret="...",
    project="...",
)
2024-04-10 17:31:54,621 - INFO(validmind.api_client): Connected to ValidMind. Project: [Int. Tests] Customer Churn - Initial Validation (cltnl29bz00051omgwepjgu1r)
%matplotlib inline

import xgboost as xgb

Preview the documentation template

A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.

You will upload documentation and test results into this template later on. For now, take a look at the structure that the template provides with the vm.preview_template() function from the ValidMind library and note the empty sections:

vm.preview_template()

Load the demo dataset

# You can also import taiwan_credit like this:
# from validmind.datasets.classification import taiwan_credit as demo_dataset
from validmind.datasets.classification import customer_churn as demo_dataset

df = demo_dataset.load_data()

Preprocess the raw dataset

train_df, validation_df, test_df = demo_dataset.preprocess(df)
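The preprocess() helper returns three partitions of the data. As a rough illustration of what such a three-way split involves, here is a minimal sketch using plain pandas (the three_way_split helper, the 60/20/20 ratio, and the column names are illustrative, not the library's actual implementation):

```python
import pandas as pd

def three_way_split(df, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle a DataFrame and split it into train/validation/test partitions."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled.iloc[:n_train],                    # training partition
        shuffled.iloc[n_train:n_train + n_val],     # validation partition
        shuffled.iloc[n_train + n_val:],            # test partition
    )

# Toy frame standing in for the churn data; "Exited" mimics a target column
df_demo = pd.DataFrame({"x": range(100), "Exited": [0, 1] * 50})
train_s, validation_s, test_s = three_way_split(df_demo)
print(len(train_s), len(validation_s), len(test_s))  # → 60 20 20
```

The demo module's preprocess() may also apply feature transformations; the sketch above only covers the splitting step.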

Train a model for testing

We train a simple customer churn model for our test.

x_train = train_df.drop(demo_dataset.target_column, axis=1)
y_train = train_df[demo_dataset.target_column]
x_val = validation_df.drop(demo_dataset.target_column, axis=1)
y_val = validation_df[demo_dataset.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
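The early_stopping_rounds=10 argument halts training once the validation metric has not improved for 10 consecutive boosting rounds, keeping the best round seen so far. The core idea can be sketched in plain Python (the loss values and helper below are made up for illustration, not XGBoost internals):

```python
def early_stopping_round(val_losses, patience=10):
    """Return the index of the best round, stopping once the loss
    has not improved for `patience` consecutive rounds."""
    best_loss = float("inf")
    best_round = 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_round = loss, i
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_round

# Validation loss improves until round 3, then plateaus
losses = [0.60, 0.55, 0.52, 0.50] + [0.51] * 20
print(early_stopping_round(losses, patience=10))  # → 3
```

In the cell above, XGBoost tracks the metrics named in eval_metric on the (x_val, y_val) set supplied via eval_set and applies this stopping rule during fit().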
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, early_stopping_rounds=10,
              enable_categorical=False, eval_metric=['error', 'logloss', 'auc'],
              feature_types=None, gamma=None, gpu_id=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, n_estimators=100, n_jobs=None,
              num_parallel_tree=None, predictor=None, random_state=None, ...)

Initialize ValidMind objects

We initialize the objects required to run test suites using the ValidMind framework.

vm_dataset = vm.init_dataset(
    input_id="raw_dataset",
    dataset=df,
    target_column=demo_dataset.target_column,
    class_labels=demo_dataset.class_labels,
)

vm_train_ds = vm.init_dataset(
    input_id="train_dataset",
    dataset=train_df,
    target_column=demo_dataset.target_column,
)

vm_test_ds = vm.init_dataset(
    input_id="test_dataset",
    dataset=test_df,
    target_column=demo_dataset.target_column,
)

vm_model = vm.init_model(model, input_id="model")
2024-04-10 17:31:56,119 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:31:56,358 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:31:56,406 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...

Assign predictions to the datasets

We can now use the assign_predictions() method from the Dataset object to link existing predictions to any model. If no prediction values are passed, the method will compute predictions automatically:

vm_train_ds.assign_predictions(
    model=vm_model,
)
vm_test_ds.assign_predictions(
    model=vm_model,
)
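Conceptually, when no prediction values are passed, assign_predictions() runs the model's predict() over the dataset's feature columns and stores the result alongside the target. A rough sketch of that fallback behavior, with hypothetical stand-in classes (not ValidMind internals):

```python
class TinyModel:
    """Stand-in model: predicts 1 when the feature sum is positive."""
    def predict(self, rows):
        return [1 if sum(r) > 0 else 0 for r in rows]

def assign_predictions_sketch(features, model, predictions=None):
    """Mimic the fallback: compute predictions only if none were supplied."""
    if predictions is None:
        predictions = model.predict(features)
    return predictions

preds = assign_predictions_sketch([[1.0, 2.0], [-3.0, 1.0]], TinyModel())
print(preds)  # → [1, 0]
```

Passing precomputed values instead (for example, predictions stored from a production scoring run) skips the predict() call entirely.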
2024-04-10 17:31:56,497 - INFO(validmind.vm_models.dataset): Running predict()... This may take a while
2024-04-10 17:31:56,499 - INFO(validmind.vm_models.dataset): Running predict()... This may take a while

Run the data preparation section

In this section, we focus on running the tests within the data preparation section of the model documentation. After running this function, only the tests associated with this section will be executed, and the corresponding section in the model documentation will be updated.

results = vm.run_documentation_tests(
    section="data_preparation",
    inputs={
        "dataset": vm_dataset,
    },
)

Run the model development section

In this section, we focus on running the tests within the model development section of the model documentation. After running this function, only the tests associated with this section will be executed, and the corresponding section in the model documentation will be updated.

results = vm.run_documentation_tests(
    section="model_development",
    inputs={
        "dataset": vm_train_ds,
        "model": vm_model,
        "datasets": (vm_train_ds, vm_test_ds),
    },
)

Run multiple model documentation sections

This section demonstrates how you can execute several sections at once by passing a list to run_documentation_tests(). After running this function, the tests associated with both the model development and model diagnosis sections will be executed, and their corresponding model documentation sections updated.

results = vm.run_documentation_tests(
    section=["model_development", "model_diagnosis"],
    inputs={
        "dataset": vm_test_ds,
        "model": vm_model,
        "datasets": (vm_train_ds, vm_test_ds),
    },
)